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Abstract 

Background: In network meta-analyses, several treatments can be compared by connecting evidence from clinical 
trials that have investigated two or more treatments. The resulting trial network allows estimating the relative effects 
of all pairs of treatments taking indirect evidence into account. For a valid analysis of the network, consistent 
information from different pathways is assumed. Consistency can be checked by contrasting effect estimates from 
direct comparisons with the evidence of the remaining network. Unfortunately, one deviating direct comparison may 
have side effects on the network estimates of others, thus producing hot spots of inconsistency. 

Methods: We provide a tool, the net heat plot, to render transparent which direct comparisons drive each network 
estimate and to display hot spots of inconsistency: this permits singling out which of the suspicious direct 
comparisons are sufficient to explain the presence of inconsistency. We base our methods on fixed-effects models. 
For disclosure of potential drivers, the plot comprises the contribution of each direct estimate to network estimates 
resulting from regression diagnostics. In combination, we show heat colors corresponding to the change in 
agreement between direct and indirect estimate when relaxing the assumption of consistency for one direct 
comparison. A clustering procedure is applied to the heat matrix in order to find hot spots of inconsistency. 

Results: The method is shown to work with several examples, which are constructed by perturbing the effect of 
single study designs, and with two published network meta-analyses. Once the possible sources of inconsistencies are 
identified, our method also reveals which network estimates they affect. 

Conclusion: Our proposal is seen to be useful for identifying sources of inconsistencies in the network together with 
the interrelatedness of effect estimates. It opens the way for a further analysis based on subject matter considerations. 

Keywords: Network meta-analysis, Inconsistency, Cochran's Q, Hat matrix 



Background 

Evidence from various treatment comparisons in differ- 
ent randomized trials can be combined by a network 
meta-analysis. This method not only aggregates evi- 
dence from direct comparisons, but also involves indirect 
comparisons, i.e. relative effect inferences for previously 
observed or not observed contrasts. References [1-4] give 
an overview of the recent methodological development. 
The validity of a network meta-analysis and, in particular, 
that of the indirect comparisons depends on a consis- 
tent network of treatment effects. However, there might 
be specific treatment effects in the network that lead to 
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inconsistency, e.g. due to being based on studies with 
divergent patient or trial characteristics [5,6] or due to bias 
[7]. Perturbed treatment effects can strongly affect other 
network estimates, which induces further inconsistency 
between direct and indirect estimates. This calls for tools 
that can identify the flow of evidence in the network, i.e. 
that can highlight direct comparisons that strongly drive 
other treatment effect estimates and hot spots of network 
inconsistency. 

In this context, inconsistency means disagreement 
between direct and indirect evidence that can occur in 
addition to heterogeneity between studies with the same 
treatment arms. A network meta-analysis can be visual- 
ized by a graph, whereby the set of nodes corresponds to 
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the considered treatments and the edges display the treat- 
ment comparisons of all included trials. If correspond- 
ing treatment effect estimates of various connections, or 
so called paths, differ between two treatments, there is 
inconsistency. Since the start and end point for different 
alternative network paths are the same, inconsistency can 
only be detected in such network loops [8,9]. It is not pos- 
sible to trace inconsistency back to a single comparison in 
a network that only includes one loop, but comparisons 
that are included in several loops may be identifiable as a 
unique source for a hot spot of inconsistency. 

In the following, we therefore provide methods for iden- 
tifying such hot spots, which might consist of loops, 
parts of loops or even just single comparisons. We also 
investigate the influence of individual comparisons on the 
network estimates that might drive further perturbation 
and invalid network estimates due to the network design. 

Different approaches to assess inconsistency have been 
discussed. The series of Technical Support Documents 
produced by the NICE Decision Support Unit [10] pro- 
vides a detailed review of methods on this topic. The 
oldest method to assess inconsistency is to examine it in 
three-treatment loops [11]. For example, Cipriani et al. 
[12] apply it to every three-treatment loop in the net- 
work. While including larger loops as well, Salanti et al. 
[6] systematically repeat the method for every loop in the 
network. Another method to assess inconsistency is to 
set up a mixed model with a sparse covariance structure 
that allows for one extra variance component to capture 
inconsistency; this was performed in a classical likelihood 
framework [8] and in a Bayesian framework [9,13]. 

Finally, consistency can be assessed by comparing a 
model that satisfies only some consistency restrictions 
(or no restrictions at all) with the consistency model. 
The node-splitting method [14] extends the consistency 
model by only one parameter that captures the differ- 
ence between a contrast, e.g. treatment A versus treat- 
ment B, that is assessed in all direct comparisons and 
the same contrast assumed to be valid from the indi- 
rect evidence. Unfortunately, the definition of the indirect 
evidence is not quite clear for multi-armed studies, and 
the node-splitting methods were recognized as depend- 
ing on the choice of reference treatment in multi-armed 
studies [15,16]. Recently, Higgins et al. [15] and White 
et al. [16] have set up a modeling paradigm where studies 
are distinguished by design, i.e. by the full set of treat- 
ments compared. In this case, the effect of a contrast, e.g. 
between treatment A and treatment B, may differ in the 
full inconsistency model depending on being estimated 
in two-armed studies or e.g. in three-armed studies con- 
taining additionally treatment C, or treatment D. In their 
model, inconsistency is no longer a violation of some 
equations that reflect loops, but rather model parameters 
reflecting design-by-treatment interactions. Lu et al. [9] 



and White et al. [16] have used the term inconsistency 
degrees of freedom for the difference in the number of 
parameters between the full inconsistency model and the 
consistency model, but Lu et al. [9] denned them with- 
out distinguishing direct evidence from two-armed and 
multi-armed studies. 

Lu and Ades [9] gave preference to a Bayesian approach 
and favored random-effects models that include inconsis- 
tency factors as random effects. Senn et al. [17] cautioned 
against random-effects analysis and pointed out (as did 
[18]) that in fixed-effects models with variances assumed 
to be known, a Cochran-type chi-squared statistic results 
for the overall heterogeneity in the network. Caldwell et al. 
[19] proposed a chi-squared statistic for testing the con- 
sistency of independent network paths between one pair 
of treatments. White et al. [16] proposed a global Wald 
chi-squared test for all design-by-treatment interaction 
parameters (treated as fixed effects), applied in a model 
with random effects for heterogeneity within designs that 
is fitted via restricted maximum likelihood method. This 
test may lack power by implicitly attributing part of the 
inconsistency to heterogeneity. 

In this paper, we will define another global chi-squared 
test for inconsistency that results by comparing a fixed- 
effects model for inconsistency with a consistency model. 
It will emerge as a part of the decomposition of Cochran's 
Q statistic into components accounting for heterogeneity 
among studies sharing the same design and inconsistency. 

Once inconsistency has been assessed globally, means 
are needed to find its sources. Senn et al. [17] inspected 
the squared Pearson residuals at the study level, which 
sum up to the overall Q chi-squared statistic. The 
design-by-treatment interaction parameters introduced 
by [15,16] may be used to spot inconsistency. Unfortu- 
nately, the definition of these parameters relies on the 
ordering of treatments. More generally, all regression 
diagnostic methods (see [20]) can be applied. References 
[21,22] have discussed this for classical meta-analysis and 
[23] for network meta-analysis. There are some attempts 
to visualize network meta-analysis for assessing hetero- 
geneity, including inconsistency [6,9,14,17,23,24]. None of 
these have met with general acceptance as yet, and they 
do not address the needs as well as the forest plot does 
in classical meta-analysis, which simultaneously discloses 
each study's weight and deviation from the pooled esti- 
mate. Notably, for this purpose, the Galbraith plot [25], 
although far less commonly used, is even better suited. 

In the following, we systematically develop a graphi- 
cal tool for highlighting hot spots of inconsistency by 
considering the detailed change in inconsistency when 
detaching the effect of studies with the same treatment 
arms. Furthermore, we identify drivers for the network 
estimates. Highlighting of inconsistency will provide more 
information than just singling out inconsistent loops. We 
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provide a matrix display that summarizes network drivers 
and inconsistency in two dimensions, such that it may 
be possible to trace inconsistency back to single deviat- 
ing direct comparisons. Naturally, it is difficult to dis- 
play detailed network properties in just two dimensions, 
but we propose a clustering approach that automatically 
groups comparisons for highlighting hot spots. 

Section "Methods" provides a detailed description of the 
different building blocks of our proposal: We present a 
fixed-effects model for network meta-analyses within the 
framework of general linear models with known variances 
in Section "Parameterization and two-stage analysis of a 
fixed-effects model in network meta-analysis". Based on 
this model, we discuss the resulting hat matrix in Section 
"Identifying drivers via the hat matrix", which we use as 
an instrument for identifying drivers. We suggest using a 
chi-squared statistic for the heterogeneity in the network, 
which we decompose into a test statistic for the incon- 
sistency and a test statistic for the heterogeneity within 
groups of studies, classified according to which treatments 
are involved. A graphical tool that visualizes the network 
drivers and inconsistency hot spots is given in Section 
"Identifying hot spots of inconsistency". Specifically, we 
use the inconsistency information along with detach- 
ing of single component meta-analyses to locate incon- 
sistency hot spots. All the steps in Section "Methods" 
are illustrated using artificial examples. Section "Results" 
then provides results for two published network anal- 
yses. Finally, we discuss our methods and results in 
Section "Discussion", and we provide concluding remarks 
in Section "Conclusions". 

Methods 

In the following, we provide a fixed-effects model for 
network meta-analyses, on which we base our fur- 
ther analysis. We present tools to identify hot spots of 
inconsistency in the network and drivers with a high 
impact on network estimates. Using these two tools, we 
provide a graphical display to locate potential sources 
of inconsistency. 

Parameterization and two-stage analysis of a fixed-effects 
model in network meta-analysis 

We consider a network meta-analysis with T + 1 treat- 
ments A$, . . . , At, under which A$ represents a reference 
treatment. A total of 5 studies compares these treat- 
ments, such that a graphical representation of the com- 
parison network with treatments as nodes and edges 
linking treatments directly compared in some studies cre- 
ates a connected graph (see e.g. Figure la). We summarize 
all studies s (s = 1, . . . , 5) in the set <S, classify each 
study by the number of included treatments N s and by a 
design index d = 1, . . . ,D according to which treatments 
are respectively involved (see [15,16,23] for a similar 



approach). We define Sd as the subset of studies with the 
same design d that includes Nd different treatments. 

For a fixed-effects analysis, this network can be written 
in matrix notation as the following general linear model 
with heteroscedastic sampling variances: 



Y = X0 net + 6. 



(1) 



Y is a vector of observed treatment effects of all 5 stud- 
ies, e.g. log odds ratios for a binary outcome and the 
design matrix X with T columns contains the structure 
of the network at the study level. For all studies of one 
design, we choose the same reference treatment. Assum- 
ing a consistent network, we estimate the vector of basic 
parameters 0 net (in terminology of [9]) corresponding to 
the treatment effects of all T comparisons to the reference 
treatment. By considering linear combinations of them, 
we can then infer all other effects of the network. The vec- 
tor 6 comprises all error terms of the model with E(e) = 0 
and known covariance matrix V, which has a diagonal 
form. The length of vector Y and e as well as the number 
of rows of X and V depend on the number and design of 
the included studies. Each two-armed study provides one 
entry to Y, one entry to the diagonal of V, and one row to 
X. We deal with the case of multi-armed studies separately 
in Section "Multi-armed studies". 

For exemplifying the model components, we con- 
sider a simple example of a network meta-analysis with 
three treatments A0.A1.A2 (T = 2) and four observed 
studies (5 = 4): two studies (s = 1, 2) for comparison 
A\ versus Aq (d = 1), one study (s = 3) for com- 
parison A2 versus Aq (d = 2), and one study (s = 4) 
for comparison A2 versus A\ (d = 3). Then the basic 
parameters are O^.f and O^.f for the contrast of treat- 
ment Ai versus Aq (d = 1, named 0:1) and the con- 
trast of treatment A2 versus Aq (d = 2, named 0:2). 
Under the consistency assumption, it follows that the 
effect 0$ = 0£|* - 0£f . Let Y s be the observed effect and 
V s is the corresponding sampling variance in study s. We 
then have: 



Y = 



(Y 1 \ 




1 0" 


Y 2 


, x= 


1 0 


Y 3 


0 1 






-1 1 



, v= 



Vi 0 0 0 

0 v 2 0 0 

0 0 V3 0 

0 0 0 V4 



The vector of the basic parameters 0 net can be esti- 
mated in a classical frequentist manner by generalized 
least squares as follows: 



<9 net = (x f y- l x)- l x f y- l Y, 



(2) 



which is sometimes referred to as the Aitken estimator 
[26]. 

This estimation can equivalently be performed in two 
steps (as discussed in [18,23]). First, D meta-analyses 
with inverse variance weighting summarization provide 
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pooled estimates and their variances per study design. 
Secondly, model (1) deals with the results of these compo- 
nent meta-analyses just as with single study observations. 
The inverse variance weighting estimation of the first step 
is as follows: 



ndir 



\seS d J seS d 



(3) 



(4) 



\seS d 



Thus, evidence of all studies with the same treatment 
arms (s e Sd) is initially summarized, resulting in esti- 
mated treatment effects #4 ir and covariance matrices V$ ir 

a a 

of so-called direct comparisons, since these comparisons 
are actually observed. In the second stage of the estima- 
tion, a linear model is fitted to the effect vector 6 dir := 
(Of 



dir 



summarized direct comparisons: 



§ & * = x a e D 



(5) 



V a . The covariance 



with E(e a ) = 0 and Cov(e a ) =: 
matrix is given by V a := diag(Vf ir , . . . , V^ ir ) and is the 
compressed design matrix containing one set of rows for 
each design. In the case of two-armed studies, the design 
matrix X a is formed by stacking one row over each of the 
others rows for each type of design. In the example above 
we have: 



X a = 



1 0 
0 1 
-1 1 



Multi-armed studies 

We distinguish each set of multi-armed studies sharing 
the same set of treatments as a different design. That 
means that if we add a three-armed study for A 2 versus 
A\ versus Aq to the example above, we consider a further 
design (d = 4). 

Since the effects observed in one multi-armed study 
cannot be inconsistent, we use one design-specific treat- 
ment as a study reference for each multi-armed study, 
e.g. Aq in all studies comparing A2 versus A\ versus Aq. 
Then, a study with M + 1 arms adds to the vector Y of 
model (1) a vector Y s of M treatment effects for each 
comparison to the reference. In our example we have the 
vector Y s = (Yo:i> YoaY of comparison A\ versus Aq and 
comparison A2 versus Aq. Furthermore, the multi-armed 
study gives M rows for X with the corresponding con- 
trasts. Since pairwise treatment effects of one study are 
correlated, the multi-armed study adds a block V s of size 
M x M for the covariance matrix V of the sampling error e 
(compare to [13,27]). In the case of multi-armed studies of 
design d, a summarized treatment effect 6f r is a vector of 



length M with covariance matrix V^ ir of sizeMxM. These 
summarizations can be calculated in accordance with the 
equations (3,4) [28] and can be used as observations in 
model (5). The design matrix X a contains then M rows 
for the corresponding design of the studies. This means in 
the simple example above with four two-armed and one 
three-armed study (D=4) that: 

1 0^ 

0 1 

X a = -11 

1 0 
0 1 

Identifying drivers via the hat matrix 

In linear models, the hat matrix contains the linear coeffi- 
cients that present each predicted outcome as a function 
of all observations. Its diagonal elements are known as 
leverages. They summarize the importance of the respec- 
tive observation for the whole estimation. Observations 
with both high leverage and large residual are recognized 
as being highly influential [29] . 

In the context of network meta-analyses and model (5), 
the hat matrix is: 

H := X a {X a V- x X a T l X' a V-\ (6) 

Its rows are the linear coefficients of 6^ ir (d = 1, . . . ,D) 
for the network estimate 0j, et , where 6^, et is the subvec- 
tor of 0 net corresponding with design d ' . The coefficients 
are a generalization of the study weights in simple meta- 
analyses but do not sum up to one. They have values 
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Figure 1 Network design and hat matrix of an illustrative 
network meta-analysis. In a), the network design of an illustrative 
example is given: six treatments and eight different observed designs 
based on two-armed studies. The nodes correspond to the 
treatments, and the edges show which treatments are directly 
compared. The thickness of an edge represents the inverse standard 
error (Vf r )~ ] / 2 , which is equal one for all designs. In b), the resulting 
hat matrix at the design level is given in percent, which indicates the 
contribution of the direct estimate in design d (shown in the column) 
to the network estimate in design d' (shown in the row). In addition, 
the absolute values of the matrix elements are visualized by the area 
of the gray squares. 
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between -1 and 1. While in simple meta-analyses the con- 
tribution of a study (or weight of a study) to the pooled 
estimate is proportional to the precision of the study, 
in network meta-analyses the contribution of direct esti- 
mates to a network estimate is a function not only of its 
precisions, but also of the network structure. Depending 
on the agreement of direct and indirect evidence, a large 
absolute entry in H indicates a strong influence of the 
respective direct estimate. Note that H is not necessarily 
symmetric, and for multi-armed studies the choice of the 
reference treatment affects the corresponding coefficients. 

In network meta-analyses, the diagonal elements of H 
have a special role. In a connected network, the trace of H 
equals T, the number of parameters of model (5). In fact, 
each network estimate can be written as a weighted mean 
of a direct estimate which is based on all comparisons 
involving only the given two treatments and the indirect 
estimate which is based on all other studies. The diagonal 
element of H is identical to the weight of the direct esti- 
mate in this presentation. Different than in many regres- 
sion applications, the off-diagonal elements of H deserve 
special attention in network meta-analyses. The smaller 
the diagonal element, the more weight is given to indirect 
evidence. This is also discussed in [18]. The off-diagonals 
indicate which study designs contribute in an essential 
way to the indirect part of the network estimate. 

As an illustration of the hat matrix, we use an example 
of a network meta-analysis with six treatments (T = 5) 
and eight different observed designs (D = 8) based on 
two-armed studies (Nd = 2 for all d = 1, . . . , 8). The cor- 
responding network is shown in Figure la), where the 
nodes correspond to the treatments and the edges show 
which treatments are directly compared. The thickness of 
an edge represents the inverse standard error (V^ ir ) -1 / 2 , 
which is equal to one for all d in our example (V a = h, 
where ^8 is the identity matrix of size eight). For one design 

— 1/2 

there might, for example, be one study with V s =1 
or 100 studies with V^ 1 ^ 2 = 0.1. The resulting hat matrix 
at the design level is given in percent in Figure lb). In 
addition, the absolute values of the matrix elements are 
visualized by the area of the gray squares. 

The diagonal squares indicate that the network esti- 
mates are predominantly driven by their corresponding 
direct estimates, all more than 50%. The diagonal squares 
are the largest for the edges 1:6 and 3:4 that intercede 
between the two triangles. Their direct estimates drive 
70% of their network estimates. The smallest diagonal 
squares are seen for the edges 1:3 and 4:6 (direct estimates 
drive 53%), since the latter ones are paralleled by two inde- 
pendent indirect paths and the former ones only by one. 
Inspecting the off-diagonal squares, we learn that aside 
from its direct estimates, the network estimates O^.f and 
O^.f are driven by the other corresponding direct estimate 



and then by Of}^. Due to symmetry, the same holds for the 
edges involved in the triangle {4:5, 5:6, 4:6}. 

Identifying hot spots of inconsistency 
Decomposition of Cochran's Q 

An important aspect in meta-analysis is to investigate sta- 
tistical heterogeneity. In network meta-analysis inconsis- 
tency arises as another aspect of heterogeneity. In a clas- 
sical meta-analysis comparing two treatments, Cochran's 
Q [30] is a well-accepted tool for assessing heterogeneity 
between studies, which is seen to be the sum of squared 
Pearson residuals. We use the generalized Cochran's Q 
statistic for multivariate meta-analysis [27,31] in the con- 
text of network meta-analyses: 

Q net := (y - _ XO^YV-^Y - X6 net ). (7) 

To examine the heterogeneity of the whole network in 
more detail, particularly considering the inconsistency in 
the model, we decompose the Q net statistic into two parts 
(similar to [32] who used a decomposition by study group 
in the context of classical meta-analysis): 

Q net = Q het + Q inc (g) 

The first is a sum of within-design Q statistics 

D 

Q het :=J]Qhet with (9) 

d=l 

qT := E ^ - ofyvr^Y* - ^ dir )- (10) 

seS d 

The second is a between-designs Q statistic 

Q inc , = ( £dir _ x jnetyy-10dir _ X J*«). (11) 

The heterogeneity of the whole network can be assigned 
to the heterogeneity between studies by Q het , related to 
each design d with Q^ et > and otherwise to the inconsis- 
tency of the network by Q inc . Under the null hypothesis 
for both homogeneity and consistency, all Q statistics 
((7), (9), (10), (11)) are approximately chi-squared dis- 
tributed with respective degrees of freedom given in 
Table 1. Thereby, the degrees of freedom of the chi- 
squared distribution corresponding to Q inc are identical 
to those defined in [16]. All Q statistics are independent 
of the choice of design-specific reference treatment. 

For example, for the network design in Figure la) 
we assume inconsistent treatment effects by (Of^jOf^, 
0$£, . . . , O^Y = (5, 0, 0, ... , 0)', where each component 
meta-analysis corresponds to one study. The perturbation 
effect of five means that the contrast differs by five stan- 
dard errors of a direct effect estimate. This may be a lot 
if the precision of component meta-analysis is small. This 
effect was chosen here in order to achieve a reasonable 
power for illustration purposes. 
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Table 1 The network Q statistics and the degrees of freedom of their corresponding chi-squared distribution 



Null hypothesis 


Q statistic 


Degrees of freedom 


Homogeneity in the whole network 


Q net 




Homogeneity within designs 


Q het 


df Qhet := J2d=] 


Homogeneity within design d 


Q het 


= (^ € S d (N 5 -1))-N d + 1 


Consistency between designs 


Qinc 


(Ed=i (A/d-1))-T 


Consistency between designs after 


f) inc 


df Q inc := df Q inc — Nd + 1 


detaching the effect of design d 






In a network with T + 1 treatments and a set of studies S,d = 1 , 


. . , D designs are observed. The set of studies with design d is given by Sd- The numbers of 



treatments in study s and in design d are given by N s and A/<y respectively. 



In real applications, the power may be small [33] and 
a failure to detect inconsistency does not automatically 
imply consistency. Note, however, that a deviating effect 
cannot be absorbed into a heterogeneity variance com- 
ponent, other than in random-effects models. Depending 
on the number of studies that inform a design, a single 
deviating study may inflate either Q inc or Q het . That is 
why inconsistency and heterogeneity must be considered 
jointly. 

As network estimates, we obtain in the example (O^.f, 

/□net /met /met /met /met /met nnet\f _ /o i £7 1 000 a £fin 
^1:3 ' ^1:6 ' ^2:3 ' ^3:4 ' ^4:5 ' ^4:6 ' ^5:6 ' ~~ {0.1.0/ , l.SSS, U.DUU, 

-1.834, -0.500, -0.166, -0.333, -0.167)'. With this, an 
inconsistency statistic Q inc = 3.36 + 1.78 + 0.25 + 3.36 + 
0.25 + 0.03 + 0.11 + 0.03 = 9.17 results that is chi-squared 
distributed with 8 — 5 = 3 degrees of freedom. Since there 
cannot be heterogeneity between studies, in this example 
Q inc and d/qmc are identical to Q net and d/qnet. 

If some of the component meta-analyses are het- 
erogeneous, the others can still validly be tested by 
their Q^ et . Even Q inc has some interpretation in this 
case: The direct estimates are estimates of the inverse 
variance-weighted averages of different true but unknown 
study- specific treatment effects. Then, Q inc with the 
same reference distribution provides a valid test of the 
hypothesis of consistency of these averaged treatment 
effects. 

Detaching a single design 

Once inconsistency is indicated by a large Q inc , formula 
(11) can be used to assess the contribution of each com- 
ponent meta-analysis of design d to the inconsistency. 
In fact, Q inc is the sum of quadratic forms of residuals 
over all designs. For simple comparisons between two 
treatments, the summands are squared Pearson residu- 
als. Unfortunately, a deviating effect of one component 
meta-analysis can simultaneously inflate several residuals. 
Therefore, we fit a set of extended models allowing for 
a deviating effect of each study design in turn and recal- 
culate the Q statistic. This procedure is equivalent to a 
leave one out' approach: Once per fit, studies with one 



design are left out of the network estimate to obtain an 
independent estimate of the treatment effect in design d 
and to obtain a network model fit independent of stud- 
ies with design d (for another leave one out approach, 
see [14]). 

More formally, we modify model (5) by inserting Nd — 1 
new parameters ^ ir " ind into the parameter vector 0^ := 
(6 net , ^ ir " ind ) for all pairwise treatment comparisons in 
design d to the design-specific reference. The design 
matrix of the new model needs an extra column for 
each new parameter, here notated as set of indicator 
vectors with entry one for each pair- wise com- 

parison in d and entry zero for all other comparisons. 
So we add Nd — 1 columns for each design with Nd 
treatments. Each additional column corresponds to one 
of the non-reference- treatments. We have the following 
model: 

§te = (x a ,i [d} )e$ + €, (12) 

with E(e) = 0 and Cov(e) = V a as previously. In 
this model, the parameters 0 net capture all network evi- 
dence without the information from studies with design d, 
and the parameters 6^ ir ~ md denote the difference between 
direct and indirect effect estimate in design d. The lat- 
ter is called a design-by-treatment interaction in White 
et al. [16], but in contrast to White et al, we only add 
extra columns for one design at a time. Remaining incon- 
sistency in this model can be tested by the corresponding 
Q statistic: 

QSS :=/J, W^(* (13) 

that is chi-squared distributed with df^^ degrees of free- 
dom (see Table 1). Here, the vector 

R ( d):=6 dk -(X a ,l { d})0$ (14) 
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of length Yl,d=i^d — 1) contains the residuals that are 
identical to those of a consistency model fitted after hold- 
ing out design d. And in the case of design d, the residuals 
equal zero. 

For illustration purposes, we successively introduce one 
new parameter for each of the eight possible detachments 
of one component meta-analysis into the inconsistent net- 
work example from Section "Decomposition of Cochran's 
Q" corresponding to Figure 1. For design 1:2, we use 
a parameter vector extended by Of}™^ in model (12); 
in combination with the design matrix (X a , (1, 0, ... , 0)0- 
With 0^ 2) = (5, 0, . . . , 0)' this results in Q^ 2) = 0 + • • • + 
0 = 0 that is chi-squared distributed with two degrees of 
freedom. For design 1:3, we respectively obtain Qq C 3 ) = 
1.15 + 0.00 + 1.15 + 1.15 + 1.15 + 0.13 + 0.50 + 0.13 = 
5.36 that is also chi-squared distributed with two degrees 
of freedom. 

Finally, to locate the inconsistency in the network, we 
compare the remaining inconsistency after exclusion of 
design d studies to the inconsistency before exclusion for 
all designs d' = 1, • • • ,D by: 



1 




Figure 2 Network design of an illustrative network 
meta-analysis. The nodes correspond to eight treatments and the 
edges display observed treatment comparisons. Design 6:7 and 3:4 
do not contribute to the inconsistency assessment and are not 
incorporated into a net heat plot. 



Q% ■■= QT - Q7 (dy (is) 

Here, 

Qj? C : = " X d'O net Y Vf {Of - X d ,0^) (16) 

is the summand in Q inc belonging to design d' (it is 
E?'=i QT = Q inC )> and QT{d) is the corresponding part 
from Q 1 /^ and model (12). Since R(d)d' = 0 for all d' = d; 
it follows that in this case QjF d = Q^ c . In other words, 
Q^F d is the reduction of the squared standardized resid- 
ual Q^) c for design d' due to elimination of design d 
studies. 

In the example, holding out design 1:2 results in a per- 
fect fit of model (12) and we obtain Q$* 2 = for 
all ^ in {1:2,...5:6} (Q?jf 1:2 = 3.36, Qf% 1:2 = 1.78,...) 
since Q^. 2 ) = 0. F° r the detachment of the component 
meta-analysis with design 1:3, we obtain Q^fis — 2.21, 
Q^ 1:3 = 1.78, and so on. 

The net heat plot 

For a graphical inspection of network inconsistency, 
we use a color visualization of the quadratic matrix 
(Q^ d )d',d=\,...,D> which we call a net heat plot in the fol- 
lowing. Warm colors in this plot (yellow over orange 
to red) indicate a positive A negative is 

illustrated by blue colors. Because of the non-negative 
scalars Q^ c on the diagonal of the matrix, which sum 



up to the Q inc statistic, the corresponding diagonal ele- 
ments of the plot have non-blue colors. Warm colors on 
the off-diagonal of the plot indicate that a detachment 
of the component meta-analysis with design d (shown 
in the columns) reduces the inconsistency at design d' 
(shown in the rows). The inconsistency between direct 
and indirect evidence at design d' before the detach- 
ment is indicated by the color of the diagonal element 
d An increase in inconsistency is indicated by blue 
colors. The stronger the intensity of the color is, the 
greater the difference between the inconsistency before 
and after the detachment of studies with design d is. The 
color of the whole plot is implemented to have a max- 
imum intensity for absolute values greater or equal to 
eight. 

Designs where only one treatment is involved in 
other designs of the network (for example design 6:7 
in Figure 2) or where the removal of Sd would lead 
to a splitting of the network (for example design 
3:4 in Figure 2) do not contribute to the inconsis- 
tency assessment and are not incorporated into the net 
heat plot. 

For the arrangement of the rows and columns of the 
plotted matrix, we use the sum of the absolute distances 
between the rows and the absolute distances between 
the columns of (Q d F d )d f ,d=i,...,D f° r complete linkage 
clustering (see for example [34]). This results in col- 
ored block structures that potentially indicate hot spots 
of inconsistency. 
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In the plot we also draw gray squares, as shown in 
Figure lb), with areas proportional to the corresponding 
absolute elements of the hat matrix from equation (6). The 
larger the square is, the stronger the direct estimate of 
design d drives the network estimate of design d! . Conse- 
quently, a design d with large squared Pearson residuals 
Q^ c strongly influences design d' . The combination of 
the color for the inconsistency and the differently-sized 
squares results in the visual appearance of a halo that 
relays both types of information at the same time (see 
for example [35] for use of such halo visualizations in a 
different context). 



Further illustrative examples 

To illustrate the application of the net heat plot, 
we consider the network example from the previous 
sections and Figure 1 as well as four additional net- 
work meta-analysis examples with six treatments and six, 
eight, or all possible fifteen component meta-analyses 
based on two-armed studies (d in {1:2, 1:3, . . . , 5:6} with 
Nd = 2). These networks are displayed as graphs in 
Figures 3a) to e) on the left side, where the edges cor- 
respond to the different direct comparisons. The thick- 
ness of an edge represents the inverse standard error 
V d , which is equal to one for all d e {!,...,£)}. 



CD^-CDlfiCDCOCMCO K CvlCOLOCDCD^fCOCD 




(AJCDLOCO^t-CD^t-LOCDLOCDCDLOCO^t- 




Figure 3 Five illustrative network meta-analyses with net heat plot. In a) to e), the network design is shown on the left: six treatments and six, 
eight or fifteen different observed designs based on two-armed studies. The nodes are placed on the circumcircle and are labeled according to the 
treatments. The edges show which treatments are directly compared. The thickness of an edge represents the inverse standard error (V^ ir )~ ] / 2 , 
which is equal one for all designs. We introduced inconsistency by perturbing the effect of one edge (marked in red) by five standard errors of the 
direct effect estimate. The corresponding net heat plots are shown on the right side: The area of the gray squares displays the contribution of the 
direct estimate in design d (shown in the column) to the network estimate in design d! (shown in the row). The colors are associated with the 
change in inconsistency between direct and indirect evidence in design d' (shown in the row) after detaching the effect of design d (shown in the 
column). Blue colors indicate an increase and warm colors indicate a decrease (the stronger the intensity of the color, the stronger the change). 
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We have produced an inconsistent network of treat- 
ment effects by adding a 8 = 5 to one treatment 
effect 6j ir , while all other effects of the network remain 
zero. 

Because the network structures and the assumed pre- 
cisions of the direct effects are the same in scenar- 
ios a) to c), they share the same hat matrix, which 
is discussed in Section "Identifying drivers via the hat 
matrix" and illustrated in Figure lb). That is why 
the net heat plots in Figures 3a) to c) contain the 
same gray squares, just ordered differently due to 
the clustering. 

In scenario a), inconsistency is introduced through the 
treatment effect in design 1:2. The overall inconsistency 
statistic is Q inc = 9.17 (p = 0.027, see Table 2). In 
the net heat plot, the color intensities of the diagonal 
elements indicate that the squared Pearson residual for 
design 1:3 and especially the residuals for the designs 
1:2 and 2:3 almost solely contribute to Q inc . The latter 
ones have higher residuals, although their direct estimates 
drive their network estimates more strongly, with 63% 
in contrast to 53% in the case of design 1:3. This can 
be seen in the hat matrix elements that are displayed 
here by the area of the squares. The warm-colored off- 
diagonal elements in the column of design 1:2 or 2:3 are 
equal to the colors on the diagonal, which indicates a 
complete elimination of inconsistency in the whole net- 
work after relaxing design 1:2 or 2:3. This is also rec- 
ognizable by Q(i C 2 ) an d Q(2 C 3) m Table 2, each with a 
p value of one. A detachment of design 1:3 does not 
reduce all residuals but increases that of the designs 1:6, 
3:4, and 4:6, as indicated by the blue colors. Relaxing 
other designs causes only little change to the squared 
Pearson residuals. For example relaxing design 1:6 weakly 
reduces the residuals of design 1:2 and 2:3 but inflates 
the residuals of design 1:3 and increases the inconsis- 
tency in the whole network (p = 0.016 for Q^ c 6 ))- Due 
to the arrangement of the rows and columns in the plot 
(as explained in Section "The net heat plot"), we can 
see a hot spot of inconsistency between the effects of 
the component meta-analyses with designs 1:2, 2:3, and 
1:3 by the warm-colored block on the diagonal; how- 
ever, the effect of 1:3 is supported by other evidence of 
the network shown by the blue-colored elements in row 
and column of design 1:3. Altogether, designs 1:2 and 2:3 
can be identified as a source of inconsistency in the net- 
work. However, to be able to understand whether the 
effects of the component meta-analyses of both designs 
are the source or whether only one of them is, we need 
more network connectivity so that they are included 
solely in network loops. The squares in the columns of 
the two identified designs show that the corresponding 
treatment effects drive the network estimate of design 
1:3, which is therefore perturbed. Although attenuated, 



driving is also observed in designs 1:6, 3:4, and 4:6, as 
far as the influence of the effect in design 1:2 (and 2:3) 
is sufficient. 

In scenario b), we shifted the effect in design 1:6 anal- 
ogously to scenario a) by 8 = 5 from the rest of the 
network. This causes a Q inc of only 7.50 with a p = 
0.058, which is mainly composed of the squared Pearson 
residuals of designs 1:3 and 4:6 and especially of the 
residuals of designs 1:6 and 3:4. Contrasting the colors 
and the size of squares on the matrix diagonal shows 
that the latter two hold the strongest inconsistency con- 
tribution, although their corresponding direct estimates 
drive their network estimates the most strongly. In this 
scenario, a detachment of the effect in designs 1:6 or 
3:4 eliminates the inconsistency of the network. In con- 
trast, relaxing one of the designs 1:2, 2:3, 4:5, or 5:6 
only slightly reduces the inconsistency of the whole net- 
work (each Q^ 2) , ); ); Q™ 6) with p = 0.033), 
and a detachment of designs 1:3 or 4:6 even increases 
the inconsistency (each Q l ^ c 3 y Q 1 ^) with a p = 0.033). 
As well, in all six cases the squared Pearson residual 
of at least one other design is inflated. So in this sce- 
nario, we see a hot spot of inconsistency between designs 
1:6, 3:4, 1:3, and 4:6 by the intense warm-colored block 
on the diagonal (4 x 4). The strongest inconsistency is 
between the effect in designs 1:6 and 3:4. Weaker inconsis- 
tency can be observed between the effects in the designs 
1:2 and 2:3 as well between the effects in 4:5 and 5:6. 
The effects of designs 1:3 and 4:6 are supported by the 
evidence of the designs 1:2 and 2:3 as well as 4:5 and 
5:6 respectively. So in this scenario, designs 1:6 and 3:4 
can be identified as a plausible source of inconsistency, 
and analogous to scenario a), the inconsistency caus- 
ing edge 1:6 cannot be distinguished from the jointly- 
acting edges 3:4, although in this example these two 
are not adjacent edges. The squared Pearson residuals 
for the two identified designs, shown on the diagonal 
of the plot, are smaller in comparison to the residuals 
of the designs 1:2 and 2:3 in scenario a), although in 
both scenarios a perturbation is introduced with 8 = 5. 
This is because the corresponding network estimates are 
more strongly driven by their direct estimates with each 
70% and not only with 63% as in designs 1:2 and 2:3. 
The squares in the columns of the two identified designs 
indicate that they drive the network estimates in designs 
1:3 and 4:6 and, a little weaker, of the remaining other 
designs, which therefore differ from their direct estimates. 
Contrasting the colors and the size of squares on the off- 
diagonal elements of all 2 x 2 blocks on the diagonal 
implies that the weakest amount of treatment effect devi- 
ation is observed between the effects in designs 1:2 and 
2:3 as well as between the effects in 4:5 and 5:6, since 
the largest hat matrix elements are here as well the less 
intensive color. Altogether, the influence of the perturbed 
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Table 2 The inconsistency in the illustrative examples 

a) b) c) d) 



Q df p Q df p Q df p Q df p 

9.17 3 0.027 7.50 3 0.058 11.67 3 0.009 4.17 1 0.041 
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4.17 
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6.82 
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0.033 


6.82 
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0.00 


0 


5.36 
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0.069 


0.00 
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1 







Q'^ 0.00 2 1 6.82 2 0.033 6.82 2 0.033 0.00 0 1 

Q^ 3) 5.36 2 0.069 

O inc 
^(1:4) 

o inc 

Q^ 6) 8.33 2 0.016 0.00 2 1 8.33 2 0.016 0.00 0 1 

Q^ 3) 0.00 2 1 6.82 2 0.033 6.82 2 0.033 0.00 0 1 

O inc 
^(2:4) 

Qinc 
U (2:5) 
Qinc 
U (2:6) 

Q' ( 3^ 4) 8.33 2 0.016 0.00 2 1 8.33 2 0.016 0.00 0 1 

O inc 
U (3:5) 

o inc 

U (3:6) 

0.00 0 1 

0.00 0 1 

The Q statistic for inconsistency as well the Q statistic after detaching the effect of design d are given for each of the scenarios a) to e) in Figure 3. In addition, the 
degrees of freedom (df) of the corresponding chi-squared distribution and the p value (p) are shown. 
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0.054 



treatment effect in design 1:6 is more broad, but with 
overall weaker severity as the equally perturbed effect in 
scenario a). 

In scenario c), we changed the effect in design 1:3 
with 8 = 5 and found the highest network inconsis- 
tency statistic Q inc = 11.67 (p = 0.009) in comparison 
to both previous scenarios. The squared Pearson resid- 
ual for design 1:3 provides the largest contribution to 
the Q inc statistic. Smaller residuals are observed for the 
adjacent edges 1:2, 2:3, 1:6, and 3:4. A detachment of 
the effect in design 1:3 eliminates the inconsistency of 
the network. Relaxing other designs causes only a little 
change to the squared Pearson residuals and increases 
residuals for some designs. A hot spot of inconsistency 
can be seen between the effects in designs 1:3, 1:2, and 
2:3. However, the effect in design 1:2 is supported by the 
effects in designs 1:6, 3:4, and 4:6, and vice versa, the lat- 
ter ones are supported by the effects in design 1:2. The 
same holds for the effect in design 2:3 and the effects 
in the three designs. Altogether, edge 1:3 can be dis- 
tinctly identified as a plausible source of inconsistency 
since this is nested in two loops. The squared Pear- 
son residual for this design is higher in comparison to 
the residuals for the inconsistency-generating designs in 
the previous two scenarios, although in all scenarios an 



equally strong perturbation is introduced. This is because 
1:3 is the least self-driving design. Since the effect of 
design 1:3 strongly drives the network estimates of the 
designs 1:2, 2:3, 1:6, and 3:4, they are also influenced by 
the perturbation. 

In scenario d), we analyze a sparsely connected network 
that forms one loop. In such a network with observed 
inverse standard errors being the same for each direct esti- 
mate, all corresponding network estimates are composed 
83% of its own and 17% balanced of all other direct esti- 
mates. So, in the net heat plot we see only large squares 
on the diagonal. A perturbation of the effect at design 1:2 
results in a network inconsistency statistic of Q inc = 4.17 
(p = 0.041), which is the sum of equally-sized squared 
Pearson residuals. A detachment of any design interrupts 
the loop and flow of evidence so that the network esti- 
mates correspond, if existing, to their direct estimates and 
the inconsistency of the network is dissolved. In this sce- 
nario, we can recognize inconsistency but cannot locate 
its source since we have insufficient degrees of freedom. 
Nevertheless, several indirect estimates were affected by 
the perturbation of design 1:2. 

In network scenario e), all fifteen possible pairwise com- 
parisons are observed with same precision. Because of this 
tight linkage, each network estimate is driven one-third 
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by its corresponding direct estimate. The remaining two- 
thirds of indirect estimation is based on all eight adjacent 
edges in a balanced way. The disturbance of the network 
consistency by adding a 8 = 5 to treatment effect 
does not produce as much inconsistency in the whole net- 
work as seen in the other scenarios (Q inc = 16.67 with 
p — 0.082). Almost exclusively, the squared Pearson resid- 
ual for design 1:2 is increased so that a detachment of 
design 1:2 eliminates the inconsistency. A detachment of 
one of the eight adjacent edges causes only a little change 
and even weakly increases the inconsistency in the whole 
network, which results each time in a p value of 0.075. In 
the case of non-adjacent edges, the p values correspond- 
ing to Q 1 ^ are even 0.054. So in this scenario, the source of 
inconsistency is uniquely identifiable in the net heat plot, 
even more easily compared to scenario c). It only weakly 
drives and affects the network estimates of its adjacent 
edges so that the perturbation of the effect in design 1:2 
has only a little influence on the network. 

The examples show that perturbation of a single design 
may have side effects on residuals, more or less spread 
out in the network. Our clustering proved successful in 
grouping together designs with interrelated residuals that 
were simultaneously affected by one perturbation. The 
resulting hot spots facilitate the identification of sources of 
inconsistency, which may or may not be uniquely identi- 
fiable. While related large residuals are obviously grouped 
together, it may also occur that large residuals emerg- 
ing from two independent perturbations are also grouped 
in proximity. In this case we expect to find two diago- 
nal blocks, each signaling the local side effects of one 
perturbation and each representing one hot spot of incon- 
sistency. 

Software 

We implemented our methods in the open-source statisti- 
cal environment R [36]. While multivariate meta-analysis 
for the aggregation step of studies with the same design 
can be carried out using standard statistic software 
[28,37], we provide a preliminary stand-alone R func- 
tion for the net heat plot available on the website http:// 
www.unimedizin-mainz.de/fileadmin/kliniken/imbei/ 
Dokumente/Biometrie/Software/netheat.R. An R package 
is in preparation and will be available from the standard 
CRAN repository for the R environment. 

Results 

An example of a network meta-analysis in diabetes 

We applied our methods to a network meta-analysis 
example by Senn at al. [17]. They examined the con- 
tinuous outcome of blood glucose change according 
to the marker HbAlc in patients with type two dia- 
betes after adding one treatment out of ten differ- 
ent groups of glucose-lowering agents to a baseline 



migl benf 




rosi piog 

Figure 4 Network design in the diabetes example. The nodes are 
placed on the circumcircle and are labeled according to the 
treatments. The edges display the observed treatment comparisons. 
The thickness of the edges is proportional to the inverse standard 
error of the treatment effects, aggregated over all studies including 
the two respective treatments. The network includes 25 two-armed 
studies on fourteen different designs and one three-armed study of 
design placacanmetf. 



sulfonylurea therapy. As effect measures, we consider 
mean differences. 

The ten different treatment groups are abbreviated 
as follows by their first four letters: acar: Acarbose, 
benf: Benfluorex, metf: Metformin, migl: Miglitol, plac: 
Placebo, piog: Pioglitazone, rosi: Rosiglitazone, sita: 
Sitagliptin, SUal: Sulfonylurea alone, vild: Vildagliptin. 
This network meta-analysis involved 26 randomized 
controlled trials including one three-armed trial for 
placacanmetf and 15 different designs, of which ten are 
used in only one study. In the network, 15 out of 45 pos- 
sible different pair-wise contrasts are observed, of which 
eight involve a placebo (see Figure 4). 

Across the entire network (analogues to the result of 
Senn at al. [17]) as well as for exclusively within designs, 
we observed heterogeneity with p values < 0.001 (see 
Table 3). Regarding the Q^ et statistics, the component 
meta-analyses with designs placbenf, placmetf, placmigl, 
and, placrosi contribute the most to the heterogeneity 
within designs. 

To have a closer look at the inconsistency of the net- 
work, we use the net heat plot in Figure 5. Studies with 
design placbenf, placmigl, placsita, or placvild are not 
included in this plot because they do not contribute to 
the inconsistency assessment. There are direct treatment 
effects that strongly drive other network estimates in a 
consistent manner. For example, the treatment effects 
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Table 3 Heterogeneity and inconsistency in the diabetes example 





Q statistic 




Number of studies 


Degrees of freedom 


p value 


Q net 




96.98 
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Qinc 
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The decomposition of the Q statistics as well as the degrees of freedom of the corresponding chi-squared distributions and the p values are shown. In addition the 
considered number of studies are displayed. Only one study is observed for the following designs: placacar, placpiog, placsita, placvild, acanSUal, metf:piog, 
metf:SUal, piog:rosi, roskSUal, placacanmetf. For this reason, the corresponding G>J} et statistics are not shown. 



in designs placacar and acanSUal agree with the exist- 
ing direct evidence of each other, but we observe a Q inc 
statistic with a p value of 0.002, which is composed of 
the squared Pearson residuals for the designs metf: SUal, 
roskSUal, placpiog, metf:piog, and placrosi. The first two 
have higher residuals in comparison to placpiog, although 
their direct estimates more strongly drive their network 
estimates, with 56% and 41% in contrast to 36% in the 
case of design placpiog. We can observe a hot spot of 



inconsistency between the effects in designs metf:SUal, 
roskSUal, placpiog, and metf:piog, for which only one 
study is observed in each case. The effects in designs 
placpiog and metf:piog as well as, in particular, the 
designs metf:SUal and roskSUal are especially inconsis- 
tent. Although the direct estimate in design placrosi is 
hampered with large heterogeneity (p = 0.001), it has 
a large evidence base of six studies and hence strongly 
drives its network estimate with 83% and other network 
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Figure 5 Net heat plot in the diabetes example. The area of the gray squares displays the contribution of the direct estimate in design d (shown 
in the column) to the network estimate in design d' (shown in the row). The colors are associated with the change in inconsistency between direct 
and indirect evidence in design d' (shown in the row) after detaching the effect of design d (shown in the column). Blue colors indicate an increase 
and warm colors indicate a decrease (the stronger the intensity of the color, the stronger the change). The two contrasts of the three-armed study 
with design placacanmetf are marked with *. 
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Figure 6 Net heat plot in the diabetes example after exclusion of the study with design roshSUal. The area of the gray squares displays the 
contribution of the direct estimate in design d (shown in the column) to the network estimate in design d! (shown in the row). The colors are 
associated with the change in inconsistency between direct and indirect evidence in design d' (shown in the row) after detaching the effect of 
design d (shown in the column). Blue colors indicate an increase and warm colors indicate a decrease (the stronger the intensity of the color, the 
stronger the change). The two contrasts of the three-armed study with design placacanmetf are marked with *. 



estimates as well Note, that the contribution of single 
studies is easily disclosed by splitting the amount of 83% 
into a sum according to the inverse variances of the esti- 
mates of each study (83% = 11% + 18% + 20% + 22% + 
4% + 8%). A detachment of the corresponding design 
reduces the residuals of design metf:SUal, roskSUal, and 
placpiog, but inflates the residuals of design metf:rosi and 
piog:rosi. Overall, a detachment of the effects for each 
of the five inconsistent component meta-analyses men- 
tioned increases the squared Pearson residuals for some 
other designs in the network and results in blue entries in 
the plot. 

The strongest reduction in the whole network incon- 
sistency is achieved with a detachment of the effect 
in design roskSUal In this case, the net heat plot in 
Figure 6 results. The inconsistency between the effects 
in designs placpiog and metf:piog remains, but in an 
attenuated form. Now, the effect of design metf:SUal 
is inconsistent with the effect of the designs placacar 
and acanSUal, which were supported by the effect in 
design roskSUal in the previous version of the network. 
However, with a p value of 0.342 for the Q inc statistic, 
there is no longer strong evidence for inconsistency. The 
hot spot of inconsistency detected included designs with 
only one study. Indeed, one or a few biased studies may 



either cause heterogeneity when paralleled by other stud- 
ies of the same design (which is observed within the 
placrosi studies) or may cause inconsistency when solely 
representing a design. 

An example of a network meta-analysis in antidepressants 

Cipriani et al. [12] performed a network meta-analysis 
to examine the efficacy between twelve new-generation 
antidepressants as monotherapy for the acute-phase treat- 
ment of major depression. The twelve antidepressants are 
abbreviated as follows: bupr: Bupropion, cita: Citalopram, 
dulo: Duloxetine, esci: Escitalopram, fluo: Fluoxetine, fluv: 
Fluvoxamine, miln: Milnacipran, mirt: Mirtazapine, paro: 
Paroxetine, rebo: Reboxetine, sert: Sertraline, venl: Ven- 
lafaxine. The efficacy was defined as a reduction of at 
least 50% from the baseline depression rating score after 8 
weeks. For the network meta-analysis, they involved 111 
randomized controlled trials including two three-armed 
trials of design fluo:paro:sert. In these studies, 42 of 66 
possible pair-wise contrasts between the 12 treatments are 
observed (see Figure 7) in D = 43 different designs, of 
which 16 are observed in only one study. 

Analogous to [12], we used log odds ratios as effect 
measures, but for combining study estimates we used the 
fixed-effects model (5) instead of a random-effects model 
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paro 

Figure 7 Network design in the antidepressants example. The 

nodes are placed on the circumcircle and are labeled according to 
the treatments. The edges display the observed treatment 
comparisons. The thickness of the lines is proportional to the inverse 
standard error of the treatment effect, aggregated over all studies 
including these two respective treatments. The network includes 1 09 
two-armed studies with 42 different designs and two three-armed 
studies, both with design fluo:paro:sert. 



within the Bayesian framework. The treatment effects and 
respective standard errors of our model are very simi- 
lar to the results of Cipriani et al. [12], and the standard 
errors are not systematically smaller as could be expected, 
because we observed only little heterogeneity in the whole 
network (p = 0.113) as well as within designs (p = 0.125) 
and no significant inconsistency (p = 0.293). This results 
from the calculated Q statistics corresponding to Section 
"Decomposition of Cochran's Q" (see Table 4). Regarding 
the heterogeneity within the designs, only the two studies 
with design paro:sert are conspicuous, with a p value of 
0.006. 

The net heat plot presented in Figure 8 provides a 
detailed assessment of the slight inconsistency in this 
quite tightly connected network. As seen from the color 
on the diagonal of the plot, the squared Pearson residu- 
als for designs cita:esci, cita:paro, fluo:bupr, and mirt:venl 
contribute the most to Q inc . There is a small hot spot 
of inconsistency between the effects in designs cita:esci 
and cita:paro as well as between the effects in fluo:bupr 
and bupnsert. The largest squared Pearson residual is 
observed for design cita:esci, although the direct esti- 
mate in this design drives the corresponding network 
estimate comparatively strongly with 51% (maximum self- 
driving is observed in design dulo:esci with 61%). In 
contrast to the other four designs mentioned, the direct 



estimate of cita:esci also strongly drives network esti- 
mates for some other designs in the network, which can 
be seen from the square sizes in the corresponding col- 
umn. A detachment of the effect in design cita:esci results 
in the strongest reduction of the inconsistency in the 
whole network (resulting in Q™ ta . esci ^ = 29.6 with p — 
0.591). While the direct evidence contributes more than 
50% of the network estimate of this contrast, the direct 
estimate is larger than the network estimate (log odds 
ratio 0.39 vs. 0.17), and publication bias may be affect- 
ing the former one. The squared Pearson residuals for 
the designs cita:paro, cita:mirt, esckparo, and escksert 
are particularly reduced. In contrast, the direct treatment 
effects of designs fluo:venl and fluo:paro have the small- 
est standard error and drive the network estimates of 
many other designs (see large squares in the correspond- 
ing columns in Figure 8); however, a detachment of one 
of these designs causes only small changes in the squared 
Pearson residuals. 

Discussion 

To ensure the validity and robustness of the conclu- 
sion from a network meta-analysis, it is important to 
assess the consistency of the network and the contribu- 
tion of each component meta-analysis to the estimates. 
Our intention was to develop a sensitivity analysis tool 
that allows the identification of which component meta- 
analyses drive which network estimates and to locate the 
drivers that may have generated a hot spot of inconsis- 
tency. The net heat plot serves both purposes simulta- 
neously: the first one by graphically showing elements 
of the hat matrix and the latter one by colored block 
structures in the plot. We have shown that the net heat 
plot allows the identification of a single deviating design 
that induces inconsistency in artificial examples. In the 
case of stronger network connectivity, increased loca- 
tion specificity might be possible. In networks that only 
include one loop, it is not possible to trace inconsis- 
tency back to a single design, but designs that are part 
of several loops may be identifiable as a unique source 
for a hot spot of inconsistency. We also demonstrated 
the applicability of the plot in two published network 
meta-analyses. 

It is well known in regression diagnostics (see for 
example [29]) that the influence of an observation 
(on parameter estimates and prediction) is driven by both 
the respective residual and the diagonal element of the 
hat matrix. Analogous to classical meta-analyses, outlier 
effect estimates of single studies or a few highly-weighted 
studies play an important role, which can be inspected 
in forest plots. Influence measures are usually displayed 
as index plots with observation numbers on the horizon- 
tal axis; this has been successfully exploited for simple 
meta-analysis [21,22]. We felt that this is insufficient in 
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Table 4 Heterogeneity and inconsistency in the antidepressants example 

Q statistic Number of studies Degrees of freedom p value 

Q net 119.6 111 113-11=102 0.113 

Q inc 36.9 111 44-11=33 0.293 

Q het 82.7 111 113-44=69 0.125 

The Q statistic for the whole network, for inconsistency and for heterogeneity within designs are shown. In addition, the number of studies on which they are based, 
the degrees of freedom of the corresponding chi-squared distributions and the corresponding p values are displayed. 



network meta-analyses and thus proposed the net heat 
plot as an additional tool We display all elements of 
the hat matrix in the net heat plot and pointed out 
that the lines of the hat matrix are the linear coeffi- 
cients for a specific network estimate. As such they rep- 
resent the natural generalization of simple meta-analysis 
weights. They quantify the contribution of a component 
meta-analysis to the network-estimate of a given contrast 
and may therefore be of interest, even in a consistent 



network meta-analysis. Simultaneously, the changes in 
the squared Pearson residuals are visualized in the net 
heat plot after allowing for a deviating effect of one sin- 
gle component meta-analyses to detect outlying direct 
estimates. In passing, we have shown that Cochran's chi- 
squared statistic, the sum of squared Pearson residuals, 
can be generally used in network meta-analyses in a fixed- 
effects model framework to assess the heterogeneity of 
the whole network and can be decomposed to separate 
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Figure 8 Net heat plot in the antidepressants example. The area of the gray squares displays the contribution of the direct estimate in design d 
(shown in the column) to the network estimate in design d' (shown in the row). The colors are associated with the change in inconsistency between 
direct and indirect evidence in design d' (shown in the row) after detaching the effect of design d (shown in the column). Blue colors indicate an 
increase and warm colors indicate a decrease (the stronger the intensity of the color, the stronger the change). The two contrasts of the two 
three-armed trials with design fluo:paro:sert are marked with *. 
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out the inconsistency of the network. Particularly, we 
have shown how multi-armed studies can be incorporated 
both into the inconsistency chi-squared statistic via a 
quadratic form of Pearson residuals and into the net 
heat plot. 

Overall, inconsistency testing has also been discussed 
in large complex networks by comparing a consis- 
tency model with an unrestricted inconsistency model 
[10], as we have done in turn for each single compo- 
nent meta-analysis. However, the authors essentially only 
consider inconsistency between two-armed component 
meta-analyses because they do not analyze independent 
effects for multi-armed studies. We included multi-armed 
studies and thereby opened a way for dividing over- 
all heterogeneity exhaustively into heterogeneity within 
designs and inconsistency. Within a Bayesian framework, 
the authors discuss models with and without a random 
component for heterogeneity within component meta- 
analyses. We advocate the fixed-effects model, not only for 
the sake of simplicity but, more importantly in the diag- 
nostic framework, because it potentially provides a clearer 
picture and allows for better recognition and location 
of inconsistency. In contrast to testing loops for incon- 
sistency [6], which leads to redundant testing of many 
dependent hypotheses or is confined to simple networks 
composed of independent loops (as argued in [10]), our 
approach is applicable in large and complex networks. The 
approaches that capture inconsistency by a single extra 
variance component in a mixed effects model [8,9] only 
aim at quantifying inconsistency and at providing conser- 
vative confidence intervals. The assumptions are difficult 
to justify or falsify and, more importantly, the approach 
contains no straightforward way to locate inconsistency. 

The recently published design-by-treatment interaction 
model by [15,16] is most similar in spirit to our approach. 
In contrast to White et al. [16] and Higgins et al. [15], 
we do not include random effects for heterogeneity within 
designs. The advantage is that one or a few deviating or 
biased studies are treated equally, whether they are par- 
alleled by many other studies of the same design or are 
the sole representative of their design. In a random-effects 
model, the former studies would add to the heterogene- 
ity variance whereas the latter studies would inform fixed 
design-by-treatment interaction parameters. In a fixed- 
effects model, inconsistency is indicated by the Q statistic 
more sensitively than in the random-effects model of 
[16]. If heterogeneity or inconsistency is detected and not 
explained by single outliers, the model should be extended 
with study level covariates, along the lines explored by 
[6,38]. Ideally we should end up with a homogeneous 
model, thereby explaining rather than modeling hetero- 
geneity and inconsistency. 

However, the fact that failure to detect heterogene- 
ity does not constitute proof of homogeneity must be 



taken into account in the assessment of inconsistencies 
in network meta-analyses. This already holds for a simple 
meta-analysis and is even more relevant for network 
meta-analyses. In a network without loops, inconsistency 
cannot be detected at all. In this context, we point to the 
importance of the hat matrix. It allows for the assessment 
of the contribution of each component meta-analysis to 
a network estimate and directs attention to the crucial 
components. We have illustrated that often only a few 
components are important. 

Often, when inconsistency is observed, some compo- 
nent meta-analyses are heterogeneous, too. We point out 
that the inconsistency assessment is still valid in this con- 
text. However, then the direct effect estimates are no 
longer estimates of a single parameter, but are rather 
weighted averages of estimates of different parameters: 
the study-specific treatment effects. Nevertheless, incon- 
sistency assessment and the investigation of heterogeneity 
within component meta-analyses may interfere in this 
case, and it may be necessary to exclude single studies and 
repeat the net heat plot in order to find satisfactory expla- 
nations of overall heterogeneity. In fact, inspection of both 
coefficients (entries of the hat matrix) and of residuals was 
proposed by Senn et. al [17] at the study level, and this 
may be more appropriate if heterogeneity within designs 
is large. However, when applied at the study level, the net 
heat plot also has the additional advantage of pointing to 
influential studies, i.e. studies with large weight and large 
residuals. 

Heterogeneity and inconsistency can be broadly viewed 
as different aspects of heterogeneity, the latter being 
understood as any discrepancy between results of single 
studies and predictions based on a consistency model for 
a network. This fact is not only reflected in the decompo- 
sition of the Q statistic, but also underlines that our tools 
can be applied either at an aggregate level or at a study 
level. We presented the aggregate level approach here for 
its parsimony. The study level approach may be more 
appropriate, particularly if component meta-analyses are 
strongly heterogeneous. In fact, a visual display of the hat 
matrix at study level has been proposed and discussed 
in [17]. Another potentially viable approach would be to 
complement our tools at an aggregate level with ordinary 
forest plots for component meta-analyses. 

Some caution is due when interpreting a net heat 
plot. Different from usual regression diagnostics, a sin- 
gle component meta-analysis may stand for a large body 
of evidence in network meta-analyses. If a component 
meta-analysis is recognized as deviating from the rest or 
is identified as a major source of heterogeneity, it may or 
may not provide the more reliable part of the whole body 
of evidence. Song et al. [39] argued that sometimes the 
indirect part of evidence may be more reliable than the 
direct part. That is why tracking heterogeneity should only 
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be the starting point for focusing on subject matter details 
of component meta-analyses and, hopefully, single studies 
for finding subject matter reasons for the observed het- 
erogeneity, as argued by [40] for classical meta-analyses. 
In fact, this process of investigation was demonstrated in 
one example without using a formal tool to sort out incon- 
sistency; this was a simple inspection of squared Pearson 
residuals [17] and has been elaborated upon in worked 
examples (e.g. in [38,41]). In large and complex networks, 
we feel that the two step approach, separately investi- 
gating inconsistency and heterogeneity within designs is 
necessary in order to limit efforts. Furthermore it specifi- 
cally can answer whether a set of studies sharing the same 
design is influential. 

More than in classical regression diagnostics, there are 
model diagnostic challenges in network meta-analyses: 
Masking, a phenomenon already known, may be more 
pronounced here because we have inherently small 
numbers of observations: the component meta-analyses. 
Masking may occur if more than one observation devi- 
ates from the true model. In this case, parameter esti- 
mates are affected by outliers even after holding out one 
observation, and outliers may be obscured, i.e. masked 
[29] . To tackle this, we combined the technique of with- 
holding one observation with a graphical display. While 
this is clearly adequate if only one outlier exists, it may 
also facilitate the detection of more outliers. For a more 
rigorous approach, methods of holding out several obser- 
vations will have to be explored. The second problem, 
uniqueness, is particularly virulent in network meta- 
analyses: several component meta-analyses could be the 
explanation for all observed inconsistency. We discussed 
the extreme case of a circular network where inconsis- 
tency is completely unidentifiable. The ability to track 
down inconsistency to only one or at least a few com- 
ponent meta-analyses depends, as we illustrated, on the 
connectedness of the network. A lack of network con- 
nectivity can be useful for planning further studies, but 
the challenges for future research are twofold: find rules 
for the identifiability of deviating components and to 
find tools for economically displaying the ambiguity if it 
exists. 

Searching for influential component meta-analyses or 
influential studies is not the only way for responding to 
inconsistency and heterogeneity. As mentioned in [16] 
and worked out in [38], the consistency model can be 
extended to allow for (treatment by covariate interaction 
of) study level covariates, and the model extension can 
explain inconsistency and heterogeneity. Both approaches 
are complementary. Of note, the net heat plot could again 
be applied to an extended consistency model. 

One core component of our approach is to allow com- 
ponent meta-analyses to have deviating treatment effects. 
This idea of extending the model by relaxing parameter 



constraints is easily extended to generalized linear models 
for binary outcomes as well as to random-effects models. 
The approach is not confined to withholding the effects 
of one design, but is naturally applicable to allowing for 
an arbitrary number of designs to have specific deviat- 
ing effects, e.g. all designs containing a specific treatment. 
In all types of generalization, the challenge remains to 
perform these model relaxations in a systematic way and 
to provide tools to transparently display the multitude of 
results, for which our presented net heat can be a useful 
starting point. 

Conclusions 

We have illustrated the importance of assessing consis- 
tency in network meta-analyses, where, for example, one 
deviating component meta-analysis may induce a hot spot 
of inconsistency. As a tool for this task, we have devel- 
oped the net heat plot that displays drivers of the network 
estimates, plausible sources for inconsistency, and possi- 
ble disturbed network estimates, illustrating its usefulness 
in several artificial and real data examples. 
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